Zhi Gao

School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, China

Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning

May 06, 2025

Iterative Trajectory Exploration for Multimodal Agents

Apr 30, 2025

TongUI: Building Generalized GUI Agents by Learning from Multimodal Web Tutorials

Apr 17, 2025

Building LLM Agents by Incorporating Insights from Computer Systems

Apr 06, 2025

MMKE-Bench: A Multimodal Editing Benchmark for Diverse Visual Knowledge

Feb 27, 2025

Large-Scale Riemannian Meta-Optimization via Subspace Adaptation

Jan 25, 2025

Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage

Dec 20, 2024

FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models

Jul 16, 2024

A Real-Time Framework for Domain-Adaptive Underwater Object Detection with Image Enhancement

Mar 28, 2024

VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding

Mar 18, 2024